FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem

Authors

  • Elsa Bernard
  • Laurent Jacob
  • Julien Mairal
  • Jean-Philippe Vert

Abstract

FlipFlop implements a fast method for de novo transcript discovery and abundance estimation from RNA-Seq data. Unlike Cufflinks, it performs transcript assembly and quantitation simultaneously, using a penalized maximum likelihood approach, which leads to improved precision and recall. Other software taking this approach has a complexity that is exponential in the number of exons in the gene. We use a novel algorithm based on the network flow formalism, which gives a polynomial runtime. In practice, FlipFlop was shown to be faster than other penalized maximum likelihood-based software and to perform transcript discovery in less than half a second, even for large genes.
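The complexity claim can be made concrete with a small sketch. This is not FlipFlop's algorithm. It only illustrates why enumerating candidate transcripts explicitly becomes intractable. It assumes a hypothetical splice graph: node 0 is a source, nodes 1..n are exons in genomic order, node n+1 is a sink, and every exon can be spliced to every downstream exon. Under this assumption, each candidate transcript is one source-to-sink path. The number of paths is 2^n − 1, while the graph has only O(n²) edges. Dynamic programming over the directed acyclic graph counts those paths without listing them. The function names `full_splice_graph` and `count_paths` are hypothetical.

```python
def full_splice_graph(n_exons):
    """Build a hypothetical, fully connected splice graph as an adjacency dict.

    Node 0 is the source, nodes 1..n_exons are exons, node n_exons + 1 is the sink.
    Every edge goes from a lower to a higher node index, so the graph is a DAG.
    """
    source, sink = 0, n_exons + 1
    edges = {source: list(range(1, n_exons + 1))}  # a transcript may start at any exon
    for i in range(1, n_exons + 1):
        # exon i may be spliced to any downstream exon, or end the transcript
        edges[i] = list(range(i + 1, n_exons + 1)) + [sink]
    edges[sink] = []
    return edges, source, sink


def count_paths(edges, source, sink):
    """Count source-to-sink paths (candidate transcripts) by dynamic programming.

    Nodes are visited in decreasing index order, which is a reverse topological
    order here because every edge points to a higher index.
    """
    counts = {sink: 1}
    for v in sorted(edges, reverse=True):
        if v == sink:
            continue
        counts[v] = sum(counts[w] for w in edges[v])
    return counts[source]


if __name__ == "__main__":
    for n in (5, 10, 20):
        edges, s, t = full_splice_graph(n)
        n_edges = sum(len(succ) for succ in edges.values())
        print(f"exons={n} edges={n_edges} candidate transcripts={count_paths(edges, s, t)}")
```

With 20 exons, the graph has 230 edges but more than a million candidate paths. Methods that list every path as a separate Lasso variable therefore scale exponentially. A formulation that works on the graph's edges, such as the network flow approach described above, can avoid that enumeration.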

Similar resources

Efficient RNA isoform identification and quantification from RNA-Seq data with network flows

MOTIVATION Several state-of-the-art methods for isoform identification and quantification are based on ℓ1-regularized regression, such as the Lasso. However, explicitly listing the possibly exponentially large set of candidate transcripts is intractable for genes with many exons. For this reason, existing approaches using the ℓ1-penalty are either restricted to...

BIRS/Banff 15w5142 - Statistical and Computational Challenges In Bridging Functional Genomics, Epigenomics, Molecular QTLs, and Disease Genetics

Monday 9:15am-9:50am Laurent Jacob: Efficient RNA isoform identification and quantification from RNA-Seq data with network flows. Several state-of-the-art methods for isoform identification and quantification are based on ℓ1-regularized regression, such as the Lasso. However, explicitly listing the possibly exponentially large set of candidate transcripts is intractable for genes with many exons. ...

Differenced-Based Double Shrinking in Partial Linear Models

The partial linear model is very flexible when the relation between the covariates and responses is either parametric or nonparametric. However, estimating the regression coefficients is challenging because the nonparametric component must be estimated simultaneously. As a remedy, the differencing approach, to eliminate the nonparametric component and estimate the regression coefficients, can ...

Greedy algorithms for prediction

In many prediction problems, it is not uncommon that the number of variables used to construct a forecast is of the same order of magnitude as the sample size, if not larger. We then face the problem of constructing a prediction in the presence of potentially large estimation error. Control of the estimation error is either achieved by selecting variables or combining all the variables in some ...

Pivotal estimation via square-root Lasso in nonparametric regression

We propose a self-tuning √Lasso method that simultaneously resolves three important practical problems in high-dimensional regression analysis, namely it handles the unknown scale, heteroscedasticity and (drastic) non-Gaussianity of the noise. In addition, our analysis allows for badly behaved designs, for example, perfectly collinear regressors, and generates sharp bounds even in extreme case...

Publication date: 2013